Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques
Abstract
Hate speech on social media may spread quickly through online users and subsequently may even escalate into local vile violence and heinous crimes. This paper proposes a hate speech detection model by means of machine learning and text mining feature extraction techniques. In this study, the authors collected English-Odia code-mixed data from a Facebook public page and manually organized them into three classes. In order to build binary and ternary datasets, the three-class data are further converted into a two-class form. The modeling employs a combination of a machine learning algorithm and feature extraction. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained using the whole dataset, with features extracted based on word unigram, bigram, trigram, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec for both datasets. Using the two datasets, we developed two kinds of models for each feature: binary models and ternary models. SVM achieved better performance than NB and RF for both categories. The result reveals less confusion between hate and non-hate ...
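The n-gram and TF-IDF features named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the toy sentences are all assumptions made for illustration. In a setup like the one described, such vectors would then be passed to a classifier such as SVM, NB or RF.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=3):
    """Return every word n-gram with n_min <= n <= n_max, joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a dict of {n-gram: TF-IDF weight}.

    Assumptions: lowercased whitespace tokenization, relative term
    frequency, and an unsmoothed IDF of log(N / document frequency).
    """
    tokenized = [word_ngrams(d.lower().split(), n_min, n_max) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for grams in tokenized:
        doc_freq.update(set(grams))

    n_docs = len(docs)
    vectors = []
    for grams in tokenized:
        counts = Counter(grams)
        total = sum(counts.values()) or 1  # guard against empty documents
        vectors.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors


# Hypothetical toy sentences, not taken from the paper's dataset.
toy_docs = ["you are nice", "you are bad"]
toy_vectors = tfidf_vectors(toy_docs)
```

With this weighting, an n-gram that occurs in every document (here "you" and "you are") gets weight zero, while n-grams unique to one document (here "nice" or "bad") get positive weight, which is the property that makes TF-IDF useful for separating classes.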
Similar resources
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments
We discuss Part-of-Speech (POS) tagging of Hindi-English Code-Mixed (CM) text from social media content. We propose extensions to the existing approaches and also present a new feature set which addresses the transliteration problem inherent in social media. We achieve an 84% accuracy with the new feature set. We show that the context and joint modeling of language detection and POS tag layers do...
Detecting Hate Speech in Social Media
In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy...
Detecting the Hate Code on Social Media
Social media has become an indispensable part of the everyday lives of millions of people around the world. It provides a platform for expressing opinions and beliefs, communicated to a massive audience. However, this ease with which people can express themselves has also allowed for the large scale spread of propaganda and hate speech. To prevent violating the abuse policies of social media pl...
Shallow Parsing Pipeline - Hindi-English Code-Mixed Social Media Text
In this study, the problem of shallow parsing of Hindi-English code-mixed social media text (CSMT) has been addressed. We have annotated the data, developed a language identifier, a normalizer, a part-of-speech tagger and a shallow parser. To the best of our knowledge, we are the first to attempt shallow parsing on CSMT. The pipeline developed has been made available to the research community w...
Journal
Journal title: Applied Sciences
Year: 2021
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app11188575